Segmented In-advance Data Analytics for Fast Scientific Discovery
Abstract
Scientific discovery usually involves data generation, data preprocessing, data storage, and data analysis. As the data volume produced by a single simulation run exceeds a few terabytes (TB), the data movement that occurs in every cycle of scientific discovery remains the bottleneck in most scientific big data applications. Much research has addressed reducing this data movement. Among the existing efforts, and based on our previous research, reusing analysis results shows significant potential for optimizing the data movement between analysis operations. In this work, we propose the Segmented In-Advance (SIA) data analytics approach for optimizing data movement, and we also provide a cloud-based, elastic, distributed in-memory database to manage the intermediate analysis results. The fundamental idea of the Segmented In-Advance approach is to analyze historical operations and predict which analysis operations will be of interest in the future. The predicted analysis operation is performed in advance on a finer-grained segmented dataset, and the per-segment results are distributed in an in-memory key-value store for future reuse. The evaluation shows that the segmented in-advance data analytics approach achieves a 1.2X-6.1X speedup. The evaluation also shows good scalability of the in-memory distributed data store. The proposed Segmented In-Advance data analytics approach is a promising data movement reduction solution for scientific big data applications and fast scientific discovery.

Keywords: segmented in-advance data analytics, big data, data-intensive computing, scientific computing
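The reuse idea described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. A plain dict stands in for the distributed in-memory key-value store, the "most frequent past operation" rule is an assumed stand-in for the paper's history-based prediction, and all function names (`predict_next_operation`, `segment`, `precompute`, `query`) are hypothetical. Only operations whose per-segment results can be merged exactly (here, sum and max) are shown.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical combinable operations: a per-segment function and a merge
# function that combines partial results into the answer for a segment range.
Partial = Callable[[Sequence[float]], float]
Combine = Callable[[List[float]], float]
OPERATIONS: Dict[str, Tuple[Partial, Combine]] = {
    "sum": (lambda seg: float(sum(seg)), lambda parts: float(sum(parts))),
    "max": (lambda seg: float(max(seg)), lambda parts: float(max(parts))),
}

# A dict stands in for the distributed in-memory key-value store (assumption).
Store = Dict[Tuple[str, int], float]


def predict_next_operation(history: List[str]) -> str:
    """Assumed predictor: the most frequent operation in the history."""
    return Counter(history).most_common(1)[0][0]


def segment(data: Sequence[float], size: int) -> List[Sequence[float]]:
    """Split the dataset into fixed-size segments (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def precompute(op: str, segments: List[Sequence[float]], store: Store) -> None:
    """Run the predicted operation in advance on every segment and cache it."""
    partial, _ = OPERATIONS[op]
    for seg_id, seg in enumerate(segments):
        store[(op, seg_id)] = partial(seg)


def query(op: str, first: int, last: int,
          segments: List[Sequence[float]], store: Store) -> float:
    """Answer op over segments first..last, reusing cached results if present."""
    partial, combine = OPERATIONS[op]
    parts = []
    for seg_id in range(first, last + 1):
        key = (op, seg_id)
        if key not in store:  # cache miss: compute now and keep for later reuse
            store[key] = partial(segments[seg_id])
        parts.append(store[key])
    return combine(parts)


if __name__ == "__main__":
    data = [float(x) for x in range(100)]
    segments = segment(data, 10)
    store: Store = {}
    op = predict_next_operation(["sum", "max", "sum"])  # "sum"
    precompute(op, segments, store)
    print(query("sum", 2, 4, segments, store))  # served from cache: 1035.0
    print(query("max", 0, 9, segments, store))  # computed on demand: 99.0
```

Keying the cache by (operation, segment id) is what lets a later query over any range of whole segments reuse the in-advance results; a query whose boundaries fall inside a segment would need additional handling that this sketch omits.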